Case, B., "Philips Hopes to Displace DSPs with VLIW, TriMedia Processors Aimed at
Future Multimedia Embedded Apps," Microprocessor Report, Dec. 1994, pp. 12-18.
Gwennap, L., "New PA-RISC Processor Decodes MPEG Video, HP's PA-7100LC Uses New
Instructions to Eliminate Decoder Chip," Microprocessor Report, Jan. 1994, pp. 16-17.
TMS320C2x User's Guide, Digital Signal Processing Products, Texas Instruments, 1993,
pp. 3-2 through 3-11; 3-28 through 3-34; 4-1 through 4-22; 4-41; 4-103; 4-119; 4-120; 4-122; 4-150; 4-151.

i860™ Microprocessor Family Programmer's Reference Manual, Intel Corporation, 1992,
Chapters 1, 3, 8 and 12.

Lee, R.B., "Accelerating Multimedia with Enhanced Microprocessors," IEEE Micro, Apr. 
1995, pp. 22-32. 

Pentium Processor's User's Manual, vol. 3: Architecture and Programming Manual, Intel 
Corporation, 1993, Chapters 1, 3, 4, 6, 8, and 18. 

Margulis, N., "i860 Microprocessor Architecture," McGraw-Hill, Inc., 1990, Chapters 6,
7, 8, 10, and 11.

Intel i750, i860™, i960 Processors and Related Products, 1993, pp. 1-3.

Motorola MC88110 Second Generation RISC Microprocessor User's Manual, Motorola, Inc.,
1991.

MC88110 Second Generation RISC Microprocessor User's Manual, Motorola, Inc., Sep.
1992, pp. 2-1 through 2-22, 3-1 through 3-32, 5-1 through 5-25, 10-62 through 10-71,
Index 1 through 17.

Errata to MC88110 Second Generation RISC Microprocessor User's Manual, Motorola, Inc., 
1992, pp. 1-11. 
ART-UNIT: 2183

PRIMARY-EXAMINER: Kim; Kenneth S.

ATTY-AGENT-FIRM: Blakely, Sokoloff, Taylor & Zafman LLP 
ABSTRACT : 

An apparatus includes an instruction decoder, first and second source registers and a 
circuit coupled to the decoder to receive packed data from the source registers and to 
unpack the packed data responsive to an unpack instruction received by the decoder. A 
first packed data element and a third packed data element are received from the first 
source register. A second packed data element and a fourth packed data element are 
received from the second source register. The circuit copies the packed data elements 
into a destination register so that the second packed data element is adjacent to the
first packed data element, the third packed data element is adjacent to the second
packed data element, and the fourth packed data element is adjacent to the third packed
data element.
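As an illustration only, the element ordering described in the abstract can be modeled in Python on plain lists. The function name `unpack_interleave` is hypothetical; the sketch models only where each element lands in the destination, not the decoder, registers, or circuit.

```python
def unpack_interleave(first, second):
    """Interleave two equal-length element lists.

    Hypothetical model: the destination receives first[0], second[0],
    first[1], second[1], ... so that each element taken from the second
    source sits next to the element taken from the first source.
    """
    if len(first) != len(second):
        raise ValueError("source operands must hold the same number of elements")
    result = []
    for a, b in zip(first, second):
        result.extend((a, b))
    return result


# The abstract's four elements: the first and third come from the first
# source register, the second and fourth from the second source register.
print(unpack_interleave(["first", "third"], ["second", "fourth"]))
# ['first', 'second', 'third', 'fourth']
```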

18 Claims, 18 Drawing figures 
DOCUMENT-IDENTIFIER: US 6516406 B1

TITLE: Processor executing unpack instruction to interleave data elements from two 
packed data 

Abstract Text (1) : 

An apparatus includes an instruction decoder, first and second source registers and a 
circuit coupled to the decoder to receive packed data from the source registers and to 
unpack the packed data responsive to an unpack instruction received by the decoder. A 
first packed data element and a third packed data element are received from the first 
source register. A second packed data element and a fourth packed data element are 
received from the second source register. The circuit copies the packed data elements 
into a destination register so that the second packed data element is adjacent to the
first packed data element, the third packed data element is adjacent to the second
packed data element, and the fourth packed data element is adjacent to the third packed
data element.

Assignee Name (1) : 
Intel Corporation 

Brief Summary Text (3) : 

The present invention includes an apparatus and method of performing operations using 
a single control signal to manipulate multiple data elements. The present invention 
allows execution of move, pack and unpack operations on packed data types. 

Brief Summary Text (11) : 

A processor is described. The processor includes a first register for storing a first packed data,
a decoder, and a functional unit. The decoder has a control signal input. The control 
signal input is for receiving a first control signal and a second control signal. The 
first control signal is for indicating a pack operation. The second control signal is 
for indicating an unpack operation. The functional unit is coupled to the decoder and 
the register. The functional unit is for performing the pack operation and the unpack 
operation using the first packed data. The processor also supports a move operation. 
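The summary names a pack operation without defining it in this excerpt. As an assumption for illustration, the sketch below uses the behavior of the MMX PACKSSWB instruction: signed 16-bit words from two operands are narrowed to signed bytes with saturation, the first operand filling the low half of the result. The names `saturate_signed_byte` and `pack_words_to_bytes` are hypothetical.

```python
def saturate_signed_byte(value):
    """Clamp a signed integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, value))


def pack_words_to_bytes(first_words, second_words):
    """Hypothetical pack: narrow signed words to signed bytes with saturation.

    Assumed PACKSSWB-style ordering: results from the first operand occupy
    the low half of the packed result, results from the second the high half.
    """
    low = [saturate_signed_byte(w) for w in first_words]
    high = [saturate_signed_byte(w) for w in second_words]
    return low + high


print(pack_words_to_bytes([1, -2, 300, -300], [127, 128, -129, 0]))
# [1, -2, 127, -128, 127, 127, -128, 0]
```

Saturation (rather than simply dropping the upper byte) keeps out-of-range values at the nearest representable extreme, which is usually what media code wants.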

Drawing Description Text (15) : 

FIG. 9 illustrates one embodiment of a method followed by a processor when performing 
an unpack operation on packed data. 

Drawing Description Text (16) : 

FIG. 10 illustrates a circuit capable of implementing an unpack operation on packed 
data.

Detailed Description Text (3) : 

A processor having move, pack, and unpack operations that operate on multiple data 
elements is described. In the following description, numerous specific details are set 
forth such as circuits, etc., in order to provide a thorough understanding of the 
present invention. In other instances, well-known structures and techniques have not 
been shown in detail in order not to unnecessarily obscure the present invention. 

Detailed Description Text (19) : 

FIG. 3 illustrates the general operation of processor 109. That is, FIG. 3 illustrates 
the steps followed by processor 109 while performing an operation on packed data, 
performing an operation on unpacked data, or performing some other operation. For 
example, such operations include a load operation to load a register in register file 
204 with data from cache 206, main memory 104, read only memory (ROM) 106, or data
storage device 107. In one embodiment of the present invention, processor 109 supports
most of the instructions supported by the Intel 80486™, available from Intel
Corporation of Santa Clara, Calif. In another embodiment of the present invention,
processor 109 supports all the operations supported by the Intel 80486™, available
from Intel Corporation of Santa Clara, Calif. In another embodiment of the present
invention, processor 109 supports all the operations supported by the Pentium™
processor, the Intel 80486™ processor, the 80386™ processor, the Intel 80286™
processor, and the Intel 8086™ processor, all available from Intel Corporation of
Santa Clara, Calif. In another embodiment of the present invention, processor 109
supports all the operations supported in the IA™ (Intel Architecture), as defined by
Intel Corporation of Santa Clara, Calif. (see Microprocessors, Intel Data Books volume
1 and volume 2, 1992 and 1993, available from Intel of Santa Clara, Calif.).
Generally, processor 109 can support the present instruction set for the Pentium™
processor, but can also be modified to incorporate future instructions, as well as 
those described herein. What is important is that general processor 109 can support 
previously used operations in addition to the operations described herein. 

Detailed Description Text (51) : 

In one embodiment of the present invention, the performance of multimedia applications 
is improved by supporting not only a standard CISC instruction set (unpacked data
operations) but also operations on packed data. Such packed data operations
can include an addition, a subtraction, a multiplication, a compare, a shift, an AND,
and an XOR. However, to take full advantage of these operations, it has been
determined that data manipulation operations should be included. Such data
manipulation operations can include a move, a pack, and an unpack. Move, pack and
unpack facilitate the execution of the other operations by generating packed data in 
formats that allow for easier use by programmers. 
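One way an unpack can put data into a more convenient format (an illustrative assumption; this excerpt does not give the example) is zero-extension: unpacking packed bytes against an all-zero operand widens them to words, so a later packed addition has headroom and does not wrap at 8 bits. The sketch models 64-bit registers as Python integers; all function names are hypothetical.

```python
def unpack_low_bytes(first, second):
    """Interleave the four low-order bytes of two 64-bit values.

    Result byte 2*i is byte i of `first`; result byte 2*i+1 is byte i of `second`.
    """
    result = 0
    for i in range(4):
        a = (first >> (8 * i)) & 0xFF
        b = (second >> (8 * i)) & 0xFF
        result |= a << (16 * i)
        result |= b << (16 * i + 8)
    return result


def zero_extend_low_bytes(packed):
    """Widen the four low unsigned bytes to 16-bit words by unpacking with zero."""
    return unpack_low_bytes(packed, 0)


def packed_add_words(x, y):
    """Add four 16-bit lanes independently; a carry out of a lane is discarded."""
    result = 0
    for i in range(4):
        lane = (((x >> (16 * i)) & 0xFFFF) + ((y >> (16 * i)) & 0xFFFF)) & 0xFFFF
        result |= lane << (16 * i)
    return result


pixels = 0xF080FF10                 # low bytes: 0x10, 0xFF, 0x80, 0xF0
widened = zero_extend_low_bytes(pixels)
print(hex(widened))                 # 0xf0008000ff0010
print(hex(packed_add_words(widened, widened)))  # 0x1e0010001fe0020
```

In the second result, 0xFF + 0xFF yields 0x01FE in its word lane; a byte-wide packed addition of the original data would have wrapped that lane to 0xFE.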

Detailed Description Text (83) : 

In one embodiment, an unpack operation interleaves the low order packed bytes, words 
or doublewords of two source packed data to generate result packed bytes, words, or 
doublewords.
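The low-order interleave described above can be sketched for all three element sizes on a 64-bit packed operand. This is a behavioral model only, assuming 64-bit registers represented as Python integers; `unpack_low` is a hypothetical name, not an interface from the patent.

```python
REGISTER_BITS = 64


def unpack_low(first, second, element_bits):
    """Interleave the low-order elements of two 64-bit packed operands.

    element_bits is 8 (bytes), 16 (words) or 32 (doublewords). Only the low
    half of each source contributes: result element 2*i is element i of
    `first`, and result element 2*i+1 is element i of `second`.
    """
    if element_bits not in (8, 16, 32):
        raise ValueError("element_bits must be 8, 16 or 32")
    mask = (1 << element_bits) - 1
    count = REGISTER_BITS // (2 * element_bits)  # elements taken from each source
    result = 0
    for i in range(count):
        a = (first >> (i * element_bits)) & mask
        b = (second >> (i * element_bits)) & mask
        result |= a << (2 * i * element_bits)
        result |= b << ((2 * i + 1) * element_bits)
    return result


first = 0x1716151413121110
second = 0x2726252423222120
print(hex(unpack_low(first, second, 8)))   # 0x2313221221112010
print(hex(unpack_low(first, second, 16)))  # 0x2322131221201110
print(hex(unpack_low(first, second, 32)))  # 0x2322212013121110
```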

Detailed Description Text (84) : 

FIG. 9 illustrates one embodiment of a method of performing an unpack operation on 
packed data. This embodiment can be implemented in the processor 109 of FIG. 2. 

Detailed Description Text (96) : 

In one embodiment of the present invention, parallelism is used to achieve efficient
execution of the unpack operation. FIG. 10 illustrates one embodiment of a circuit
that can perform an unpack operation on packed data.
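FIG. 10 is not reproduced here, so the following is only a hypothetical software model of why an unpack lends itself to parallel hardware: for a given element size, each destination byte is driven by one fixed source byte, so the routing can be computed once and every destination byte selected independently. The names `unpack_low_byte_routing` and `apply_routing` are assumptions, and byte strings are taken as little-endian register images.

```python
def unpack_low_byte_routing(element_bytes, register_bytes=8):
    """Return, for each destination byte, the (source, byte index) that drives it.

    Hypothetical fixed-wiring model of an unpack-low: the routing depends only
    on the element size, never on the data.
    """
    routing = []
    for dest_byte in range(register_bytes):
        element, offset = divmod(dest_byte, element_bytes)
        source = element % 2          # even result elements from source 0, odd from source 1
        src_element = element // 2    # low-order element index within that source
        routing.append((source, src_element * element_bytes + offset))
    return routing


def apply_routing(routing, sources):
    """Select every destination byte from its routed source byte.

    Each selection is independent of the others, mirroring parallel multiplexers.
    """
    return bytes(sources[src][idx] for src, idx in routing)


first = bytes(range(0x10, 0x18))    # little-endian bytes 0x10..0x17
second = bytes(range(0x20, 0x28))   # little-endian bytes 0x20..0x27
word_routing = unpack_low_byte_routing(element_bytes=2)
print(word_routing)
# [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
print(apply_routing(word_routing, (first, second)).hex())
# 1011202112132223
```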

Detailed Description Text (104) : 

Therefore, the move, pack and unpack operations can manipulate multiple data elements. 
In prior art processors, to perform these types of manipulations, multiple separate 
operations would be needed to perform a single packed move, pack or unpack operation. 
The data lines for the packed data operations, in one embodiment, all carry relevant 
data. This leads to a higher performance computer system. 

CLAIMS : 

1. An apparatus comprising: an instruction decoder to receive an unpack instruction; a
first source register to hold a first packed data having a first plurality of packed 
data elements including a first packed data element and a third packed data element; a 
second source register to hold a second packed data having a second plurality of 
packed data elements including a second packed data element and a fourth packed data 
element; a destination register to hold a third packed data; a circuit coupled to the 
decoder to receive the first packed data from the first source register and the second 
packed data from the second source register and to unpack the first packed data and 
the second packed data responsive to the unpack instruction by copying the first 
packed data element into the destination register, copying the second packed data 
element into the destination register adjacent to the first packed data element, 
copying the third packed data element into the destination register adjacent to the 
second packed data element, and copying the fourth packed data element into the 
destination register adjacent to the third packed data element.

7. The apparatus of claim 2 wherein the decoder further decodes the unpack 
instruction, a first byte and a second byte of the three bytes comprising an operation 
code specifying an unpack operation to interleave low order packed elements from the 
first and second packed data, the elements selected from the group consisting of byte 
elements, word elements and doubleword elements. 
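The claim states that two opcode bytes select an unpack-low operation for byte, word, or doubleword elements but gives no byte values. As an assumption, the sketch below uses the MMX PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ encodings (0F 60, 0F 61, 0F 62) from Intel's instruction-set reference; `decode_unpack_low` is a hypothetical helper, and the third byte is simply passed through.

```python
# Assumed encodings: MMX unpack-low instructions. The claim does not give values.
UNPACK_LOW_OPCODES = {
    (0x0F, 0x60): ("PUNPCKLBW", "byte"),
    (0x0F, 0x61): ("PUNPCKLWD", "word"),
    (0x0F, 0x62): ("PUNPCKLDQ", "doubleword"),
}


def decode_unpack_low(instruction):
    """Split a three-byte instruction into mnemonic, element size, and third byte."""
    if len(instruction) < 3:
        raise ValueError("expected at least three bytes")
    key = (instruction[0], instruction[1])
    if key not in UNPACK_LOW_OPCODES:
        raise ValueError(f"not an unpack-low opcode: {key[0]:02X} {key[1]:02X}")
    mnemonic, element = UNPACK_LOW_OPCODES[key]
    return mnemonic, element, instruction[2]


print(decode_unpack_low(bytes([0x0F, 0x61, 0xC1])))  # ('PUNPCKLWD', 'word', 193)
```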

12. The apparatus of claim 1 wherein the first packed data element is a low order data
element of the first packed data and the second packed data element is a low order 
data element of the second packed data and the unpack instruction comprises an opcode 
field to contain one of a set of operation codes to specify an unpack operation 
interleaving low order data elements from the first and the second pluralities of 
packed data elements, the opcode field specifying data elements selected from the 
group consisting of byte elements, word elements and doubleword elements. 

14. The apparatus of claim 1 wherein the first packed data element is a high order
data element of the first packed data and the second packed data element is a high 
order data element of the second packed data and the unpack instruction comprises an 
opcode field to contain one of a set of operation codes to specify an unpack operation 
interleaving high order data elements from the first and the second pluralities of 
packed data elements, the opcode field specifying data elements selected from the 
group consisting of byte elements, word elements and doubleword elements. 
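For the high-order variant in claim 14, a behavioral sketch differs from the low-order case only in which half of each source is read. It assumes 64-bit packed operands held in Python integers; `unpack_high` is a hypothetical name.

```python
REGISTER_BITS = 64


def unpack_high(first, second, element_bits):
    """Interleave the high-order elements of two 64-bit packed operands.

    Result element 2*i is the i-th element of the high half of `first`;
    result element 2*i+1 is the i-th element of the high half of `second`.
    """
    if element_bits not in (8, 16, 32):
        raise ValueError("element_bits must be 8, 16 or 32")
    mask = (1 << element_bits) - 1
    count = REGISTER_BITS // (2 * element_bits)  # elements in each high half
    result = 0
    for i in range(count):
        shift = (count + i) * element_bits  # position of the i-th high-half element
        a = (first >> shift) & mask
        b = (second >> shift) & mask
        result |= a << (2 * i * element_bits)
        result |= b << ((2 * i + 1) * element_bits)
    return result


first = 0x1716151413121110
second = 0x2726252423222120
print(hex(unpack_high(first, second, 8)))   # 0x2717261625152414
print(hex(unpack_high(first, second, 16)))  # 0x2726171625241514
print(hex(unpack_high(first, second, 32)))  # 0x2726252417161514
```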

16. A digital processing apparatus comprising: a decoder to receive an unpack control 
signal having an Intel integer opcode format comprising three or more bytes, a third 
byte of the three or more bytes permitting a first three-bit source register address 
and a second three-bit source-destination register address; a first register to hold a 
first packed data having a first plurality of packed data elements including a first 
packed data element and a third packed data element, the first register corresponding 
to the first three-bit source register address; a second register to hold a second 
packed data having a second plurality of packed data elements including a second 
packed data element and a fourth packed data element, the second register 
corresponding to the second three-bit source-destination register address; a circuit 
to receive the first packed data from the first register and the second packed data 
from the second register, and in response to the unpack control signal, to copy the 
first packed data element into the second register, copy the second packed data 
element into the second register adjacent to the first packed data element, copy the 
third packed data element into the second register adjacent to the second packed data 
element, and copy the fourth packed data element into the second register adjacent to 
the third packed data element. 
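Claim 16 places two three-bit register addresses in the third byte of the instruction. A reasonable reading, assumed here rather than stated in the claim, is the x86 ModR/M byte in its register-register form (mod = 11): the reg field (bits 5-3) names the source-destination register and the r/m field (bits 2-0) names the source register. The helper names are hypothetical.

```python
def decode_register_fields(modrm):
    """Split a ModR/M byte into its mod, reg and r/m fields (bits 7-6, 5-3, 2-0)."""
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    return mod, reg, rm


def unpack_register_operands(modrm):
    """Return (source register, source-destination register) for mod = 11."""
    mod, reg, rm = decode_register_fields(modrm)
    if mod != 0b11:
        raise ValueError("memory operand forms are not modeled in this sketch")
    return rm, reg


print(unpack_register_operands(0xC1))  # (1, 0): source MM1, source-destination MM0
```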

17. The digital processing apparatus recited in claim 16 wherein the decoder is 
further to receive the unpack control signal having an Intel integer opcode format as 
described in the "Pentium® Processor Family User's Manual," the Intel integer
opcode format comprising three or more bytes, a first byte and a second byte of the 
three or more bytes permitting an operation code to specify an unpack operation 
interleaving low order packed byte elements, word elements or doubleword elements from 
the first and second packed data; 

18. A computer system comprising: a memory to hold an unpack instruction having an 
Intel integer opcode format comprising three or more bytes, one of the three or more 
bytes permitting a first three-bit source register address and a second three-bit 
source-destination register address; a storage device to hold software, the software 
configured to supply the unpack instruction to the memory for execution; a processor 
enabled to receive and decode the unpack instruction from the memory, the processor 
including: a first register corresponding to the first three-bit source register 
address to hold a first packed data having a first plurality of packed data elements 
including a first packed data element and a third packed data element, a second 
register corresponding to the second three-bit source-destination register address to 
hold a second packed data having a second plurality of packed data elements including 
a second packed data element and a fourth packed data element, and a circuit to 
receive the first packed data from the first register and the second packed data from 
the second register and to copy the first packed data element into the second 
register, copy the second packed data element into the second register adjacent to the 
first packed data element, copy the third packed data element into the second register
adjacent to the second packed data element, and copy the fourth packed data element 
into the second register adjacent to the third packed data element. 
TITLE: Microprocessor capable of unpacking packed data in response to a unpack
instruction 



Abstract Text (1) : 

A microprocessor capable of unpacking packed data in response to an unpack 
instruction. The microprocessor has a storage area to store a first packed data
and a second packed data respectively including a first plurality of data elements and 
a second plurality of data elements, wherein each data element in the first plurality 
of data elements corresponds to a different data element in the second plurality of 
data elements, in a respective position. The microprocessor also includes a circuit 
that simultaneously copies less than all data elements from the first plurality of 
data elements and corresponding data elements from the second plurality of data 
elements into a storage area as a third plurality of separate data elements in a third 
packed data in response to the unpack instruction. 

Assignee Name (1) : 
Intel Corporation 

Brief Summary Text (3) : 

The present invention includes an apparatus and method of performing operations using 
a single control signal to manipulate multiple data elements. The present invention 
allows execution of move, pack and unpack operations on packed data types. 

Brief Summary Text (11) : 

A processor is described. The processor includes a first register for storing a first packed data,
a decoder, and a functional unit. The decoder has a control signal input. The control 
signal input is for receiving a first control signal and a second control signal. The 
first control signal is for indicating a pack operation. The second control signal is 
for indicating an unpack operation. The functional unit is coupled to the decoder and 
the register. The functional unit is for performing the pack operation and the unpack 
operation using the first packed data. The processor also supports a move operation. 

Drawing Description Text (15) : 

FIG. 9 illustrates one embodiment of a method followed by a processor when performing
an unpack operation on packed data. 

Drawing Description Text (16) : 

FIG. 10 illustrates a circuit capable of implementing an unpack operation on packed 
data. 

Detailed Description Text (3) : 

A processor having move, pack, and unpack operations that operate on multiple data 
elements is described. In the following description, numerous specific details are set 
forth such as circuits, etc., in order 

Detailed Description Text (20) : 

FIG. 3 illustrates the general operation of processor 109. That is, FIG. 3 illustrates 
the steps followed by processor 109 while performing an operation on packed data, 
performing an operation on unpacked data, or performing some other operation. For 
example, such operations include a load operation to load a register in register file 
204 with data from cache 206, main memory 104, read only memory (ROM) 106, or data 
storage device 107. In one embodiment of the present invention, processor 109 supports
most of the instructions supported by the Intel 80486™, available from Intel
Corporation of Santa Clara, Calif. In another embodiment of the present invention,
processor 109 supports all the operations supported by the Intel 80486™, available
from Intel Corporation of Santa Clara, Calif. In another embodiment of the present
invention, processor 109 supports all the operations supported by the Pentium™
processor, the Intel 80486™ processor, the 80386™ processor, the Intel 80286™
processor, and the Intel 8086™ processor, all available from Intel Corporation of
Santa Clara, Calif. In another embodiment of the present invention, processor 109
supports all the operations supported in the IA™ (Intel Architecture), as defined by
Intel Corporation of Santa Clara, Calif. (see Microprocessors, Intel Data Books volume
1 and volume 2, 1992 and 1993, available from Intel of Santa Clara, Calif.).
Generally, processor 109 can support the present instruction set for the Pentium™
processor, but can also be modified to incorporate future instructions, as well as 
those described herein. What is important is that general processor 109 can support 
previously used operations in addition to the operations described herein. 

Detailed Description Text (53) : 

In one embodiment of the present invention, the performance of multimedia applications 
is improved by supporting not only a standard CISC instruction set (unpacked data
operations) but also operations on packed data. Such packed data operations
can include an addition, a subtraction, a multiplication, a compare, a shift, an AND,
and an XOR. However, to take full advantage of these operations, it has been
determined that data manipulation operations should be included. Such data
manipulation operations can include a move, a pack, and an unpack. Move, pack and
unpack facilitate the execution of the other operations by generating packed data in 
formats that allow for easier use by programmers. 

Detailed Description Text (86) : 

In one embodiment, an unpack operation interleaves the low order packed bytes, words 
or doublewords of two source packed data to generate result packed bytes, words, or 
doublewords.

Detailed Description Text (87) : 

FIG. 9 illustrates one embodiment of a method of performing an unpack operation on 
packed data. This embodiment can be implemented in the processor 109 of FIG. 2. 

Detailed Description Text (100) : 

In one embodiment of the present invention, parallelism is used to achieve efficient
execution of the unpack operation. FIG. 10 illustrates one embodiment of a circuit
that can perform an unpack operation on packed data.

Detailed Description Text (108) : 

Therefore, the move, pack and unpack operations can manipulate multiple data elements.
In prior art processors, to perform these types of manipulations, multiple separate 
operations would be needed to perform a single packed move, pack or unpack operation. 
The data lines for the packed data operations, in one embodiment, all carry relevant 
data. This leads to a higher performance computer system. 